On the Efficiency of Reductions in µ-SIMD Media Extensions

Authors

  • Jesus Corbal
  • Roger Espasa
  • Mateo Valero
Abstract

Many important multimedia applications contain a significant fraction of reduction operations. Although multimedia applications are, in general, characterized by high amounts of Data Level Parallelism, reductions and accumulations are difficult to parallelize and show a poor tolerance to increases in the latency of the instructions. This is especially significant for µ-SIMD extensions such as MMX or AltiVec. To overcome the problem of reductions in µ-SIMD ISAs, designers tend to include more and more complex instructions able to deal with the most common forms of reductions in multimedia. As the number of processor pipeline stages grows, the number of cycles needed to execute these multimedia instructions increases with every processor generation, severely compromising performance. This paper presents an in-depth discussion of how reductions/accumulations are performed in current µ-SIMD architectures and evaluates the performance trade-offs for near-future, highly aggressive superscalar processors with three different styles of µ-SIMD extensions. We compare an MMX-like alternative to an MDMX-like extension that has packed accumulators to attack the reduction problem, and we also compare it to MOM, a matrix register ISA. We show that while packed accumulators present several advantages, they introduce artificial recurrences that severely degrade performance for processors with a high number of registers and long-latency operations. On the other hand, this paper demonstrates that longer SIMD media extensions such as MOM can take great advantage of accumulators by exploiting the associative parallelism implicit in reductions.
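The recurrence problem and the associative parallelism mentioned in the abstract can be illustrated with a small sketch. This is not the paper's evaluation model or any real ISA's semantics: the function names, the `lanes` parameter, and the cycle-count formula are all assumptions made for illustration. It contrasts a single accumulator, where every add waits on the previous one, with several independent partial accumulators that are combined at the end.

```python
from math import ceil, log2

def serial_sum(values):
    """One accumulator: each add depends on the result of the previous add."""
    acc = 0
    for v in values:
        acc += v
    return acc

def partial_sum(values, lanes):
    """`lanes` independent accumulators, combined once at the end.

    Reordering is valid here because integer addition is associative
    and commutative (floating-point addition would not give
    bit-identical results in general).
    """
    accs = [0] * lanes
    for i, v in enumerate(values):
        accs[i % lanes] += v
    return sum(accs)

def chain_length(n, lanes, latency):
    """Toy model (an assumption, not from the paper): length in cycles of
    the longest chain of dependent adds, with a tree-shaped final combine."""
    per_lane = ceil(n / lanes)
    combine = ceil(log2(lanes)) if lanes > 1 else 0
    return (per_lane + combine) * latency

data = list(range(1, 65))          # 64 hypothetical sample values
print(serial_sum(data), partial_sum(data, 4))
print(chain_length(64, 1, 4), chain_length(64, 4, 4))
```

Under this toy model the dependent chain scales with the add latency, so a longer pipeline hurts the single-accumulator form the most, while splitting the work across independent accumulators shortens the chain without changing the result.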

Similar resources

Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements

Multimedia SIMD extensions such as MMX and AltiVec speed up media processing; however, our characterization shows that the attributes of current general-purpose processors enhanced with SIMD extensions do not match very well with the access patterns and loop structures of media programs. We find that 75-85% of the dynamic instructions in the processor instruction stream are supporting instructio...

Alternative Algorithms for Order-Preserving Matching

The problem of order-preserving matching is to find all substrings in the text which have the same relative order and length as the pattern. Several online solutions and one offline solution were proposed earlier for the problem. In this paper, we introduce three new solutions based on filtration. The two online solutions rest on the SIMD (Single Instruction Multiple Data) architecture and the offline so...

Hardware Support to Reduce Overhead in Fine-Grain Media Codes

The growing importance of media and media-like codes has caused general-purpose processors to incorporate SIMD-like extensions, such as MMX, SSE, and AltiVec. While these media extensions do improve performance, significant parallelism in these codes remains unexploited. In this paper, we propose and evaluate a programmable loop engine (PLE for short) that executes media codes and kernels...

From SIMD to Micro-Grids

I. COMPILING TO SIMD PARALLELISM Most commodity microprocessors now support multi-media instructions. These instruction-set extensions are typically based on the Single Instruction-stream Multiple Data-stream (SIMD) model in which a single instruction causes the same mathematical operation to be carried out on many operands, or pairs of operands at the same time. The multi-media instructions on...

Publication date: 2004